Resequencing and Analysis of Variation in the TCF7L2 
Gene in African Americans Suggests That SNP rs7903146 
Is the Causal Diabetes Susceptibility Variant 
OBJECTIVE — Variation in the transcription factor 7-like 2 
(TCF7L2) locus is associated with type 2 diabetes across multiple 
ethnicities. The aim of this study was to elucidate which variant 
in TCF7L2 confers diabetes susceptibility in African Americans. 

RESEARCH DESIGN AND METHODS — Through the evaluation 
of tagging single nucleotide polymorphisms (SNPs), type 2 
diabetes susceptibility was limited to a 4.3-kb interval, which 
contains the YRI (African) linkage disequilibrium (LD) block 
containing rs7903146. To better define the relationship between 
type 2 diabetes risk and genetic variation we resequenced this 
4.3-kb region in 96 African American DNAs. Thirty-three novel 
and 13 known SNPs were identified: 20 with minor allele 
frequencies (MAF) >0.05 and 12 with MAF >0.10. These poly- 
morphisms and the previously identified DG10S478 microsatellite 
were evaluated in African American type 2 diabetic cases (n = 
1,033) and controls (n = 1,106). 

RESULTS — Variants identified from direct sequencing and data- 
bases were genotyped or imputed. Fifteen SNPs showed asso- 
ciation with type 2 diabetes (P < 0.05) with rs7903146 being the 
most significant (P = 6.32 × 10⁻⁶). Results of imputation, haplotype, 
and conditional analysis of SNPs were consistent with 
rs7903146 being the trait-defining SNP. Analysis of the DG10S478 
microsatellite, which is outside the 4.3-kb LD block, revealed con- 
sistent association of risk allele 8 with type 2 diabetes (odds ratio 
[OR] = 1.33; P = 0.022) as reported in European populations; however, 
allele 16 (MAF = 0.016 in cases and 0.032 in controls) was strongly 
associated with reduced risk (OR = 0.39; P = 5.02 × 10⁻⁵), in 
contrast with previous studies. 

CONCLUSIONS — In African Americans, these observations 
suggest that rs7903146 is the trait-defining polymorphism associated 
with type 2 diabetes risk. Collectively, these results support 
ethnic differences in type 2 diabetes associations. 

Diabetes is estimated to affect nearly 24 million 
people in the United States. This significant 
disease burden translates to a major economic 
impact. Prevalence is distributed disproportionately 
across ethnicities, with some of the highest rates 
observed in African Americans, i.e., 11.8% (1). Increased 
risk is likely to be multifactorial, resulting from the com- 
bination of shared cultural, environmental, and genetic 
factors. Although recent genome-wide association studies 
of type 2 diabetes in European-derived populations have 
revealed novel, reproducible susceptibility loci (2-11), few 
have been replicated in African Americans (12-14). 

An exception to this observation is the association of the 
transcription factor 7-like 2 (TCF7L2) gene with type 2 
diabetes in African (15) and African-derived populations, 
i.e., African Americans (12,16). TCF7L2 is a transcription 
factor involved in the Wnt signaling pathway (17). Al- 
though initial reports implicated TCF7L2 in the regulation 
of the glucagon gene in the L cells of the gut (18), more 
recent reports suggest involvement in insulin secretion 
(19) potentially through epigenetic mechanisms (20). The 
initial report of association between TCF7L2 and type 2 
diabetes in an Icelandic cohort identified a 64-kb linkage 
disequilibrium (LD) block of strong LD encompassing the 
intron 3 to intron 4 region of the gene (21) in this European 
population. Refinement of this signal in expanded pop- 
ulations revealed the strongest evidence of association 
with the single nucleotide polymorphism (SNP) rs7903146 
with a relative risk of 1.45-1.49 (15). Although it has been 
inferred from these studies that rs7903146 is most likely the 
causative variant, the large (64 kb) LD block in European- 
derived populations and the large number of variants in this 
region have made it challenging to definitively conclude that 
variation at this SNP confers susceptibility to type 2 di- 
abetes based solely upon genetic studies. 

We previously reported association of TCF7L2 variants 
and type 2 diabetes in a large African American case- 
control cohort (16). Of the SNPs evaluated, association 
was observed with rs7903146 and rs7901695 (admixture-adjusted 
additive P = 3.77 × 10⁻⁶ and 0.0030, respectively) 
in a collection of 577 type 2 diabetic case subjects 
enriched for nephropathy and 596 controls. Given the 
evidence of association in our African American cohort 
(12,16), we sought to refine the genomic interval of TCF7L2 
associated with type 2 diabetes in African Americans and, 
using a comprehensive analysis of variation in TCF7L2, 
to define the genetic basis for type 2 diabetes suscep- 
tibility. 
TABLE 1 

Characteristics of African American study participants 

                                Type 2 diabetic ESRD cases    Controls
Trait                           n*       Mean ± SD            n*       Mean ± SD
N                               1,033                         1,106
Women (%)                       626      60.6                 639      57.8
Age (years)
  At exam                       994      61.5 ± 10.4          881      49.1 ± 11.9
  At type 2 diabetes diagnosis  965      41.4 ± 12.4
  At ESRD diagnosis             960      57.9 ± 10.9
BMI (kg/m²)                     996      29.8 ± 7.1           879      30.0 ± 7.1

*Number with data available. 
RESEARCH DESIGN AND METHODS 

Study subjects. This study was conducted under Institutional Review Board 
approval from Wake Forest University School of Medicine. Identification, 
clinical characteristics, and recruitment of African American patients and 
controls have been previously described in detail (22). Briefly, 1,033 unrelated 
African American patients with type 2 diabetes were recruited from dialysis 
facilities. Type 2 diabetes was diagnosed in African Americans who reported 
developing type 2 diabetes after the age of 25 years and who did not receive 
only insulin therapy since diagnosis. In addition, cases had to have at least one 
of the following three criteria for inclusion: 1) type 2 diabetes diagnosed at 
least 5 years before initiating renal replacement therapy, 2) background or 
greater diabetic retinopathy, and/or 3) >100 mg/dL proteinuria on urinalysis in 
the absence of other causes of nephropathy. An additional 1,106 unrelated 
African Americans without a current diagnosis of diabetes or renal disease 
were recruited from the community and internal medicine clinics as controls. 
All type 2 diabetic cases and nondiabetic controls were born in North Caro- 
lina, South Carolina, Georgia, Tennessee, or Virginia. DNA extraction was 
performed using the PureGene system (Gentra Systems, Minneapolis, MN). 
Sequencing. The DNA screening panel was composed of 96 African American 
subjects: 48 type 2 diabetic cases and 48 controls. PCR primers were designed 
to independently amplify a 4.3-kb region of TCF7L2. Primer sequences are 
available on request. DNA sequencing reactions were performed using BigDye 
Terminator v. 1.1 Cycle Sequencing Kits and analyzed on the Applied Biosystems 
3730xl DNA Analyzer (Applied Biosystems, Foster City, CA). Sequencing 
reactions were performed on both DNA strands. Sequence alignment 
and polymorphism identification were performed using Sequencher 4.2 (Gene 
Codes, Ann Arbor, MI). All polymorphisms were validated through observa- 
tion on both strands. A search of the region sequenced was performed at 
dbSNP (www.ncbi.nlm.nih.gov) to record all previously identified poly- 
morphisms reported in the region. 

Genotyping. Genotyping of DG10S478 was performed by fragment length 
analysis on an ABI Prism DNA Analyzer 3700 with previously published primers 
(21) in a manner similar to that previously described (23). Fragment length 
was determined using ABI Prism GeneMapper software v3.0. Fifty-six duplicate 
samples were run for quality control purposes. SNP genotyping was performed 
on the iPlex MassARRAY genotyping platform (Sequenom, San Diego, CA). 
Blind duplicates and blanks were included for quality control and error check- 
ing. For all SNPs, the genotyping success rate was greater than 95%. 
Imputation. In preliminary analyses, SNPs in the TCF7L2 gene region ± 10 kb 
(C10:114689999-114926060) were imputed using data from the HapMap phase II 
hybrid panel (1:1, YRI:CEU; 108 SNPs) and the 1000 Genomes YRI Pilot 1 
dataset (497 SNPs) to circumvent limited coverage of the genetic diversity in 
this specific region by the Affymetrix 6.0 array. SNPs were imputed in 965 type 
2 diabetic end-stage renal disease (ESRD) cases and 1,029 controls using the 
software MACH (www.sph.umich.edu/csg/abecasis) (24), and SNPs with a 
high-quality score (rsq-hat >0.3) were retained. The imputed most likely 
genotypes were then used for subsequent association tests. 

Four common SNPs identified from direct sequence analysis had minor 
allele frequencies (MAF) >0.05 and failed assay design on the Sequenom 
platform. With the use of the resequencing genotype data obtained from the 96 
African American samples as reference, these SNPs were imputed in the 
remaining 2,043 samples with high-quality score (rsq-hat >0.5) using the 
software MACH (www.sph.umich.edu/csg/abecasis) (24). The imputed most 
likely genotypes were then used for subsequent association tests. 
Analysis. SNPs were tested for departure from Hardy-Weinberg equilibrium 
(HWE) using an exact test of HWE proportions for the combined group of cases 



FIG. 1. Regional association plot for TCF7L2 ±10 kb (C10:114689999-114926060). 
All SNPs genotyped on the Affymetrix 6.0 array are plotted with 
their -log10 P values of association with type 2 diabetes versus the 
genomic position (National Center for Biotechnology Information Build 
36.1). The TCF7L2 gene position was taken from the University of 
California, Santa Cruz genome browser (green), and the core region of 
association (C10:114744078-114748339) analyzed by direct sequence 
analysis is depicted in red. The most significantly associated SNP from 
the array is depicted as a blue diamond with its correlated proxies (red = 
r² ≥ 0.80; orange = 0.50 ≤ r² < 0.80). SNP rs7903146 and microsatellite 
DG10S478, depicted as gray circles, were typed independently in the 
same set of samples. Estimated recombination rates from HapMap are 
plotted in the background to depict the LD structure in the region. 
(A high-quality color representation of this figure is available in the 
online issue.) 



and controls and then for cases only and controls only (25). Those SNPs out of 
HWE were noted but still included for the genotypic analysis. Haplotype block 
structure was established using Haploview 4.1 (26), defining blocks using the 
method from Gabriel et al. (27). 
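
The exact HWE test mentioned above can be illustrated with a minimal Python sketch. It assumes the common formulation in which all heterozygote counts compatible with the observed allele counts are enumerated and the probabilities of outcomes no more likely than the observed one are summed; whether reference 25 uses exactly this formulation is an assumption. The function name hwe_exact_pvalue and the genotype counts in the usage lines are hypothetical and are not study data.

```python
def hwe_exact_pvalue(obs_hets, obs_hom1, obs_hom2):
    """Two-sided exact test of Hardy-Weinberg proportions (illustrative sketch).

    Conditions on the observed allele counts and sums the probabilities of
    all heterozygote counts that are no more likely than the observed one.
    """
    hom_rare = min(obs_hom1, obs_hom2)
    hom_common = max(obs_hom1, obs_hom2)
    n = obs_hets + hom_rare + hom_common          # number of genotyped subjects
    if n == 0:
        return 1.0
    rare = 2 * hom_rare + obs_hets                # copies of the rarer allele

    # Unnormalised probabilities for each possible heterozygote count.
    probs = [0.0] * (rare + 1)

    # Start at the (approximately) most likely heterozygote count, which
    # must have the same parity as the rare allele count.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0

    # Recurrence towards fewer heterozygotes.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets > 1:
        probs[hets - 2] = (probs[hets] * hets * (hets - 1.0)
                           / (4.0 * (homr + 1.0) * (homc + 1.0)))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1

    # Recurrence towards more heterozygotes.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = (probs[hets] * 4.0 * homr * homc
                           / ((hets + 2.0) * (hets + 1.0)))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1

    # Sum every outcome that is no more likely than the observed one.
    p_obs = probs[obs_hets]
    p_value = sum(p for p in probs if p <= p_obs * (1.0 + 1e-9)) / total
    return min(1.0, p_value)


# Hypothetical genotype counts (heterozygotes, homozygotes 1, homozygotes 2).
p_balanced = hwe_exact_pvalue(50, 25, 25)   # counts in exact HWE proportions
p_no_hets = hwe_exact_pvalue(0, 50, 50)     # complete absence of heterozygotes
```

In this sketch the test would be applied three times per SNP (combined sample, cases only, controls only), mirroring the description in the text.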

Unadjusted measures of LD and association were assessed using the software 
SNPGWA (http://www.phs.wfubmc.edu) (28). SNPGWA computes LD 
statistics, D' and r², for each pair of tandem SNPs. SNPGWA also performs 
multiple tests of association including the overall two-degree of freedom test 
(genotype), dominant model, recessive model, additive model (Cochran-Armitage 
trend test), and the corresponding lack of fit to the additive model. 
Odds ratios, 95% confidence intervals, and P values were computed for each 
model of association. Population attributable risk (PAR) was calculated as 
PAR = (X − 1)/X. Assuming a log-additive model, X = (1 − f)² + 2f(1 − f)γ + f²γ², 
where γ is the estimated odds ratio (OR) and f is the risk allele frequency. 
DG10S478 was converted to a biallelic marker for analysis. 
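
As a worked illustration of the PAR calculation, the following Python sketch evaluates PAR = (X − 1)/X with X = (1 − f)² + 2f(1 − f)γ + f²γ², where γ is the odds ratio and f is the risk allele frequency under a log-additive model. The function name is hypothetical, and the example inputs (γ = 1.37, f = 0.28) are rounded illustrative values; rounded inputs need not reproduce a PAR computed from unrounded estimates.

```python
def population_attributable_risk(odds_ratio, risk_allele_freq):
    """PAR under a log-additive model: PAR = (X - 1) / X.

    X is the average relative risk in the population, weighting genotype
    relative risks 1, OR and OR**2 by Hardy-Weinberg genotype frequencies.
    """
    f = risk_allele_freq
    g = odds_ratio
    x = (1 - f) ** 2 + 2 * f * (1 - f) * g + (f ** 2) * (g ** 2)
    return (x - 1) / x


# Illustrative, rounded inputs (assumed for this example only).
par_example = population_attributable_risk(odds_ratio=1.37, risk_allele_freq=0.28)
print(f"PAR = {par_example:.3f}")
```

Note that X factorises as (1 − f + fγ)², so the PAR is zero when the odds ratio is 1 and increases with both the effect size and the risk allele frequency.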

Ancestry estimates were determined from 70 biallelic admixture informative 
markers (AIMs) as previously described (16,29). Briefly, AIMs were selected to 
maximize European and African allele frequency differences and sample all 
non-acrocentric arms of the autosomal genome. Reference population allele 
frequencies were derived by genotyping 44 African (Yoruba from Ibadan, 
Nigeria) and 39 European Americans. Individual ancestral proportions were 
generated for each subject using FRAPPE (30), an expectation-maximization 
algorithm, under a two-population model. The individual ancestral estimates 
were used as covariates in the association analyses. 
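
The idea behind estimating an individual ancestral proportion by expectation-maximization under a two-population model can be sketched in Python as follows. This is a simplified illustration and not the FRAPPE implementation: it assumes the reference allele frequencies in the two parental populations are known and fixed, and it estimates a single individual's proportion of ancestry from population 1. The function name, arguments, and marker values in the usage line are hypothetical.

```python
def estimate_ancestry(genotypes, freq_pop1, freq_pop2, n_iter=500, tol=1e-10):
    """EM estimate of the proportion of ancestry from population 1.

    genotypes : copies (0, 1 or 2) of the reference allele at each marker,
                or None for a missing genotype
    freq_pop1 : reference-allele frequency at each marker in population 1
    freq_pop2 : reference-allele frequency at each marker in population 2
    """
    eps = 1e-6                      # keeps frequencies away from 0 and 1
    q = 0.5                         # starting value for the ancestry proportion
    for _ in range(n_iter):
        posterior_sum = 0.0
        n_alleles = 0
        for g, p1, p2 in zip(genotypes, freq_pop1, freq_pop2):
            if g is None:
                continue
            p1 = min(max(p1, eps), 1 - eps)
            p2 = min(max(p2, eps), 1 - eps)
            f = q * p1 + (1 - q) * p2            # mixture allele frequency
            # E-step: probability that an allele copy derives from population 1
            post_ref = q * p1 / f
            post_alt = q * (1 - p1) / (1 - f)
            posterior_sum += g * post_ref + (2 - g) * post_alt
            n_alleles += 2
        if n_alleles == 0:
            return q
        # M-step: average posterior over all allele copies
        new_q = posterior_sum / n_alleles
        if abs(new_q - q) < tol:
            return new_q
        q = new_q
    return q


# Hypothetical example: four fully informative markers (frequency 1 vs 0),
# with 5 of the 8 allele copies matching population 1.
q_hat = estimate_ancestry([2, 2, 1, 0], [1.0] * 4, [0.0] * 4)
```

In an association analysis such as the one described, the resulting per-subject estimate would enter the regression model as a covariate.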

Conditional haplotype analysis. To test whether SNPs were independently 
associated with type 2 diabetes, we performed an omnibus test for the hap- 
lotype association using PLINK (31). We further adjusted the omnibus test by 
controlling for one of the SNPs at a time. A nonsignificant conditional test 
suggests that the conditioned SNP is sufficient to explain the haplotype 
association and that there is a single association signal, rather than 
multiple signals, at the haplotype. 



RESULTS 

Population characteristics. Characteristics of the African 
American case-control populations are shown in Table 1. 
Controls were significantly younger than type 2 diabetic 
cases (P < 0.0001), although they were significantly older 
than the mean age at type 2 diabetes diagnosis in the cases 
(P < 0.0001). BMI was not significantly different between 
TABLE 2 

Single SNP genotypic association results for SNPs in the TCF7L2 gene showing association with type 2 diabetes ESRD 

              Chromosome              MAF cases    MAF controls  Additive
Marker        position (bp)  Alleles  (n = 982)    (n = 1,039)   OR (95% CI)         P value
rs7079711*    1147??778      G/A      ?            ?             1.00 (0.88-1.1?)    ?
?             114740?97      C/T      ?            0.2?          0.85 (0.74-0.98)    0.0??
rs17747324†   114742743      T/C      0.08         0.07          1.26 (0.99-1.60)    0.058
rs4132115     114745736      G/T      0.19         0.15          1.23 (1.04-1.46)    0.014
rs4506565     114746281      A/T      0.51         0.47          1.15 (1.01-1.30)    0.030
rs7068741     114746498      C/T      0.19         0.15          1.24 (1.05-1.46)    0.012
rs7069007     114746525      G/C      0.14         0.11          1.23 (1.02-1.48)    0.033
rs7903146     114748589      C/T      0.35         0.29          1.35 (1.18-1.54)    1.76 × 10⁻?
rs11196187    114749685      G/A      0.06         0.05          1.29 (0.97-1.71)    0.082
rs7092484     114751173      G/A      0.28         0.26          1.07 (0.93-1.23)    0.35
rs12098651*   114751959      G/A      0.24         0.22          1.09 (0.94-1.26)    0.25
rs6585198     114752477      A/G      0.18         0.21          0.85 (0.73-1.00)    0.051

*Inconsistent with HWE in cases. †Inconsistent with HWE in controls. 



cases and controls (P = 0.49). Similar proportions of 
women were present in cases and controls (61% and 58%, 
respectively). FRAPPE (30) analysis of AIM genotypes 
estimated the mean proportion of African ancestry overall 
was 0.79 ± 0.12 and differed significantly (P < 0.0001) 
between type 2 diabetic cases and controls (0.80 ± 0.12 
and 0.78 ± 0.12, respectively). Therefore, all results are 
presented with adjustment for admixture. 
Refinement of the type 2 diabetes-associated genomic 
interval. A preliminary analysis assessed association with 
type 2 diabetes for 59 SNPs across the entire 216-kb 
TCF7L2 gene ±10 kb from the Affymetrix 6.0 array-based 
analysis of 965 type 2 diabetic ESRD cases and 1,029 
controls (Fig. 1). With the use of the Tagger program of 
Haploview (26), 43 of the 59 SNPs from the Affymetrix array 
were available in the HapMap YRI dataset and captured 
common variation at 121 SNPs (MAF >0.05; aggressive 
tagging algorithm) with a mean r² = 0.73. Notably, the SNP 
of greatest interest, rs7903146, is not typed on the 
Affymetrix 6.0 array. This SNP is located in a genomic interval 
that is not tagged well (max r² = 0.45), resulting in only 
nominal evidence of association in the Affymetrix 6.0 
analysis. To circumvent limited coverage of the genetic 
diversity in this specific region, imputation was used. 
Using data from the HapMap phase II hybrid panel (1:1, 
YRI:CEU), 108 SNPs were imputed across the TCF7L2 gene 
±10 kb (Supplementary Fig. 1). No imputed variant showed 
the same magnitude of significance as rs7903146. In a 
separate analysis, 497 SNPs were imputed from the 1000 
Genomes YRI Pilot 1 dataset (Supplementary Fig. 2). Two 
variants (rs33998771 and chr10:114740378) proximal to 
our region were identified with significant P values 
(3.58 × 10⁻⁵ and 3.29 × 10⁻⁵, respectively) similar to that 
of rs7903146. To test whether these SNPs (rs33998771, 
chr10:114740378, and rs7903146) were independently 
associated with type 2 diabetes, we performed an omnibus 
test for the haplotype association controlling for one SNP 
at a time. Four common haplotypes (ACT, ATT, TTT, and TTC) 
accounted for 99.9% of all haplotypes. These haplotype 
frequencies were significantly different between type 2 
diabetic cases and controls (0.042, 0.019, 0.268, and 0.672, 
respectively, for cases; 0.020, 0.016, 0.229, and 0.735, 
respectively, for controls; omnibus test P = 5.4 × 10⁻⁶), 
with the haplotypes TTC and ACT strongly associated with 
protection and risk for type 2 diabetes, respectively 
(P < 0.0001). Omnibus analysis revealed that the haplotype 
association was substantially reduced after adjusting for 
rs7903146 (P = 0.023), whereas strong association remained 
after conditioning on rs33998771 or chr10:114740378 
(P = 0.0006 and 0.002, respectively), suggesting that 
rs7903146 alone explained most of the haplotype association. 
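
The direction of the haplotype effects reported above can be checked with a short Python sketch that converts the case and control haplotype frequencies into crude odds ratios, comparing each haplotype against all others. These are unadjusted values computed from the rounded frequencies in the text, so they are only an approximation and will not equal admixture-adjusted estimates; the function and variable names are hypothetical.

```python
# Haplotype frequencies as reported in the text (cases, controls).
haplotypes = ["ACT", "ATT", "TTT", "TTC"]
case_freq = [0.042, 0.019, 0.268, 0.672]
control_freq = [0.020, 0.016, 0.229, 0.735]


def haplotype_odds_ratio(p_case, p_control):
    """Crude odds ratio for carrying one haplotype versus all others."""
    odds_case = p_case / (1.0 - p_case)
    odds_control = p_control / (1.0 - p_control)
    return odds_case / odds_control


odds_ratios = {
    hap: haplotype_odds_ratio(pc, pk)
    for hap, pc, pk in zip(haplotypes, case_freq, control_freq)
}

for hap, value in odds_ratios.items():
    direction = "risk" if value > 1 else "protective"
    print(f"{hap}: crude OR = {value:.2f} ({direction})")
```

With these inputs, ACT has a crude odds ratio above 1 and TTC below 1, in line with the risk and protective roles described in the text.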

These results led us to focus on a single region spanning 
~16.5 kb. Fourteen additional SNPs were genotyped 
in an expanded set of type 2 diabetic cases and controls 
within this interval and analyzed for association as 
summarized in Table 2. The core region of association was 
between SNPs rs4132115 and rs7903146 (admixture-adjusted 
additive P values ranging from 0.012 to 2.38 × 10⁻⁶). 
This region encompasses a 4.3-kb LD block in the 
YRI population (HapMap Phase II YRI data), which is 
bounded by the two SNPs rs7901695 and the previously 
associated rs7903146 (16). 

In addition, the previously associated (21) microsatellite 
marker DG10S478, which lies 38 kb distal to and outside of 
this LD block, was typed, and the results of the analysis are 



TABLE 3 

DG10S478 allelic association with type 2 diabetes ESRD in 
African Americans 

          Frequency
          Cases          Controls
Allele    (n = 1,910)    (n = 1,972)    OR      P value
-8        0.00052        0.0010         0.52    0.59
-4        0.0063         0.012          0.47    0.043
0         0.72           0.73           0.91    0.21
4         0.11           0.11           1.14    0.20
8         0.081          0.064          1.33    0.022
12        0.059          0.049          1.27    0.10
16        0.016          0.032          0.39    5.02 × 10⁻⁵
20        0.0052         0.0056         0.94    0.89
TABLE 4 

Single SNP genotypic association results for SNPs identified by direct sequence analysis in the TCF7L2 gene showing association with 
type 2 diabetes ESRD 

Variant ID                                     MAF cases    MAF controls  Additive
rs No.        Sequence ID  Position   Alleles  (n = 1,033)  (n = 1,106)   OR (95% CI)           P value
?             ?            ?          ?        ?            ?             1.14 (0.90-1.44)      0.28
              IVS3 +42112  114743468  C/-      0.00         0.00
              IVS3 +42245* 114743601
              IVS3 +42428* 114743784
?†            IVS3 +42434  114743790  C/T      0.11         0.10          1.14 (0.94-1.3?)      0.19
rs34872471    IVS3 +42705  114744061  T/C      0.40         0.34          1.30 (1.14-1.48)      8.?4 × 10⁻⁵
rs7901695     IVS3 +42722  114744078  T/C      0.50         0.47          1.16 (1.03-1.32)      0.017
              IVS3 +43235† 114744591  C/A      0.18         0.15          1.26 (1.07-1.49)      0.00??
rs35198068†   IVS3 +43418  114744774  T/C      0.40         0.34          1.31 (1.15-1.48)      2.91 × 10⁻⁵
              IVS3 +43487* 114744843
              IVS3 +43552  114744908  C/T      0.02         0.01          1.80 (1.04-3.11)      0.036
              IVS3 +43592  114744948  C/T      0.01         0.01          1.64 (0.86-3.11)      0.13
rs4132115     IVS3 +44130  114745486  G/T      0.18         0.15          1.29 (1.09-1.5?)      0.00?4
              IVS4 -44095  114745679  C/G      0.05         0.05          0.94 (0.71-1.26)      0.68
              IVS4 -44055  114745719  ?/C      0.04         0.04          0.90 (0.66-1.2?)      0.51
              IVS4 -43836  114745938  A/G      0.01         0.01          0.98 (0.48-1.99)      0.95
              IVS4 -43759  114746015  T/-      0.00         0.00
rs4506565     IVS4 -43743  114746031  A/T      0.50         0.45          1.20 (1.06-1.3?)      0.00?1
              IVS4 -43705  114746069           0.001        0.004         0.40 (0.11-1.49)      0.17
rs7068741     IVS4 -43526  114746248  C/T      0.18         0.15          1.25 (1.05-1.48)      0.010
              IVS4 -43522  114746252
rs7069007     IVS4 -43499  114746275  G/C      0.1?         0.11          1.29 (1.07-1.57)      0.008?
              IVS4 -43352  114746422  T/C      0.01         0.01          1.65 (0.87-3.1?)      0.12
              IVS4 -43090  114746684  A/?      0.05         0.04          1.15 (0.84-1.5?)      0.38
              IVS4 -43040§ 114746734  G/A      0.001        0.004         0.39 (0.11-1.38)      0.14
              IVS4 -43007  114746767
              IVS4 -42978  114746796  C/T      0.18         0.15          1.2? (1.04-1.??)      0.019
              IVS4 -42945  114746829  G/T      0.01         0.01          1.?? (0.82-2.9?)      0.18
              IVS4 -42705  114747069  A/T      0.002        0.00?         0.74 (0.21-2.64)      0.64
              IVS4 -42695  114747079  T/-      0.00         0.00
              IVS4 -42470  114747304           0.02         0.02          1.44 (0.90-2.30)      0.13
              IVS4 -42326  114747448  G/A      0.07         0.07          0.95 (0.74-1.22)      0.70
              IVS4 -42248  114747526  G/C      0.03         0.03          0.92 (0.62-1.36)      0.66
              IVS4 -42148  114747626  A/G      0.02         0.01          1.75 (1.03-2.98)      0.039
              IVS4 -42079  114747695  A/T      0.01         0.01          1.41 (0.75-2.64)      0.29
              IVS4 -41828  114747946  C/G      0.13         0.13          0.93 (0.78-1.13)      0.48
              IVS4 -41672  114748102  C/T      0.01         0.01          1.11 (0.57-2.15)      0.75
              IVS4 -41633§ 114748141  G/C      0.01         0.01          1.64 (0.72-3.77)      0.24
rs7903146*§   IVS4 -41435  114748339  C/T      0.34         0.28          1.37 (1.19-1.57)      6.32 × 10⁻⁶
              IVS4 -41323§ 114748451  G/A      0.02         0.03          0.65 (0.43-0.99)      0.043
              IVS4 -41264  114748510  C/T      0.07         0.07          0.89 (0.69-1.14)      0.36
rs4267006§    IVS4 -41005  114748769  G/T      0.06         0.05          1.22 (0.93-1.60)      0.14
rs35801464    IVS4 -39500  114750274  C/T      0.06         0.05          1.32 (1.00-1.75)      ?

*Low MAF in 96 samples resulting in poor imputation. †Imputed genotypes. ‡Inconsistent with HWE in cases. §Inconsistent with HWE in 
controls. 



presented in Table 3. Allele sizes and frequencies were 
consistent with prior genotyping in samples from African 
populations (15). Evidence of association was observed 
with the protective alleles -4 and 16 (P = 0.043 and 5.02 × 10⁻⁵, 
respectively) and the risk allele 8 (P = 0.022). 
Direct sequence analysis and association analysis of 
the associated genomic interval. The core region of 
association (C10:114744078-114748339) ± 2 kb was analyzed 
by direct sequence analysis in 96 samples (48 type 2 
diabetic cases and 48 controls). A total of 46 SNPs were 
identified, of which 72% (33/46) were novel and 17% (8/46) 



had a MAF >5%. When genotyped in the expanded cohort, 
five of the novel SNPs were found to be monomorphic and 
10 could not be typed on the Sequenom platform because 
of the repetitive nature of the region. These 10 were geno- 
typed via direct sequence analysis on a subset of 96 type 2 
diabetic cases and 96 controls. Four SNPs were common, 
and these sequencing data were paired with existing data 
to be used as the known set of haplotypes for imputation 
in the remaining cohort. In addition, rs61875120 was 
common but yielded poor quality imputation (rsq = 0.31) 
and was therefore genotyped by direct sequence analysis. 
FIG. 2. A: Haploview-generated LD map of the 40 SNPs identified by direct sequence analysis (C10:114742846-114750274) in African American 
controls (n = 1,106). Regions of high LD (D' = 1 and logarithm of the odds [LOD] >2) are shown in dark red. Markers with lower LD (0.45 < D' < 1 
and LOD >2) are shown in dark through light red, with the color intensity decreasing with decreasing D' value. Regions of low LD and low LOD 
The remaining five SNPs had low MAF (MAF <0.05) and 
were not genotyped. Table 4 summarizes sequence variants 
and association analysis results. Of the 36 SNPs typed 
or imputed, 15 SNPs were found to be nominally associated 
with type 2 diabetes (admixture-adjusted additive 
P values ranging from 0.050 to 6.32 × 10⁻⁶). The most 
striking associations were observed at rs34872471, 
rs35198068 (imputed), and rs7903146, which were correlated 
(r² > 0.74; Fig. 2) and associated with disease susceptibility 
(OR = 1.30-1.37) under an additive model. Three 
common haplotypes (CCT, CCC, and TTC) accounted for 
99.6% of all haplotypes. These haplotype frequencies were 
significantly different between type 2 diabetic cases and 
controls (0.342, 0.056, and 0.602, respectively, for cases; 
0.278, 0.058, and 0.664, respectively, for controls; omnibus 
test P = 3.7 × 10⁻⁵). Omnibus analysis revealed that the 
haplotype association was lost after adjusting for rs7903146 
(P = 0.85), whereas modest association remained after 
conditioning on rs34872471 or rs35198068 (P = 0.05), suggesting 
that rs7903146 alone is sufficient to explain the overall 
haplotype association. 
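
The pairwise LD implied by the three control haplotype frequencies above can be illustrated with a short Python sketch that computes D' and r² between two positions of a haplotype. It assumes that the three letters of each haplotype follow the order in which the SNPs are listed in the text (rs34872471, rs35198068, rs7903146); that ordering, and the function name pairwise_ld, are assumptions made for illustration.

```python
def pairwise_ld(hap_freqs, i, j, allele_i, allele_j):
    """Return (D', r^2) between positions i and j of a set of haplotypes.

    hap_freqs maps haplotype strings to frequencies; allele_i and allele_j
    are the alleles tracked at positions i and j.
    """
    total = sum(hap_freqs.values())
    freqs = {h: f / total for h, f in hap_freqs.items()}   # renormalise

    p_a = sum(f for h, f in freqs.items() if h[i] == allele_i)
    p_b = sum(f for h, f in freqs.items() if h[j] == allele_j)
    p_ab = sum(f for h, f in freqs.items()
               if h[i] == allele_i and h[j] == allele_j)

    d = p_ab - p_a * p_b                                   # raw disequilibrium
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))

    # D' scales D by its maximum attainable magnitude given allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    return d_prime, r2


# Control haplotype frequencies as reported in the text.
control_haplotypes = {"CCT": 0.278, "CCC": 0.058, "TTC": 0.664}

# First position (assumed rs34872471) versus third position (assumed rs7903146).
d_prime_13, r2_13 = pairwise_ld(control_haplotypes, 0, 2, "C", "T")
print(f"D' = {d_prime_13:.2f}, r2 = {r2_13:.2f}")
```

Under these assumptions the first and third positions give an r² of about 0.76, which is compatible with the reported r² > 0.74, while the first two positions never separate in the three common haplotypes.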

DISCUSSION 

This study illustrates the power of genetic analyses in 
African-derived populations to facilitate identification of 
trait-defining variants. TCF7L2 has been identified as one 
of the strongest type 2 diabetes susceptibility genes to date 
with associations across multiple ethnically diverse pop- 
ulations (12,15,16,21,32). Our study is consistent with the 
initial association of SNP rs7903146 in an African Ameri- 
can type 2 diabetic case-control population. By taking 
advantage of reduced LD in the African American pop- 
ulation, we have been able to narrow the critical interval 
for association. This 4.3-kb region, flanked by SNPs 
rs4132115 and rs7903146, was the focus of resequencing in 
an effort to infer which sequence variant(s) are causally 
associated with type 2 diabetes. 

Of the 46 SNPs identified by resequencing, 15 SNPs were 
nominally associated with type 2 diabetes with the most 
significant associations observed at rs34872471, rs35198068 
(imputed), and rs7903146, which were highly correlated 
(r² > 0.74; Fig. 2) and associated with disease susceptibility 
(OR = 1.30-1.37). Conditional omnibus haplotype analysis 
suggested that rs7903146 was sufficient to explain the 
haplotype association. This analysis suggests that associa- 
tion at rs34872471 and rs35198068 was the result of corre- 
lation with the true signal from rs7903146. 

Although this study has eliminated the possibility of 
additional common variants (MAF >0.01) contributing 
to type 2 diabetes susceptibility within the fine-mapped 
interval (C10:114744078-114748339 ± 2 kb) of the 
TCF7L2 locus, four variants (IVS3 +42245, IVS3 +42428, 
IVS3 +43487, and IVS4 -43007) were found to have MAF 
<0.01. These variants, which were located in highly repetitive 
regions, were not evaluated. To date these variants 
have not been identified by other ongoing studies, 
i.e., the 1000 Genomes project, suggesting they are private 
mutations. Additionally, given the low MAF, these variants 
are not likely to explain the association observed at the 



TCF7L2 locus, but we cannot rule out the possibility that 
these and other unidentified rare variants contribute to 
disease susceptibility. If this were so, effect sizes of such 
rare variants would have to be in a range unprecedented for 
noncoding variants. 

As a result of fine-mapping the TCF7L2 locus to de- 
termine the region most likely to harbor susceptibility 
variants, the microsatellite marker DG10S478 was ex- 
cluded as the causal variant. DG10S478 is located 41 kb 
proximal to the critical interval defined in the African 
American population and is in weak LD with rs7903146 
(D' = 0.35, r² = 0.07). Only a single common allele of 
DG10S478 is nominally associated with type 2 diabetes, 
with the strongest association, which is protective, seen 
with low MAF variants. These data suggest that the con- 
tribution to disease by DG10S478 is nominal. 

This study represents the first comprehensive evaluation 
of variation within the TCF7L2 gene in a large African 
American population. Taking advantage of the LD struc- 
ture in our African-derived sample of African Americans, 
we were able to reduce the genomic interval of association 
to ~4.3 kb and exclude the possible contribution of the 
previously identified microsatellite marker to type 2 dia- 
betes susceptibility. Our analysis identified three SNPs, 
rs34872471, rs35198068 (imputed), and rs7903146, which 
were highly associated with type 2 diabetes; all had P values 
that were two orders of magnitude stronger than other 
SNPs. Conditional omnibus haplotype analysis suggested 
that rs7903146 was sufficient to explain the haplotype as- 
sociation. SNP rs7903146 remains the most significantly 
associated variant within the TCF7L2 gene with a calcu- 
lated PAR of 17.4%. 

This investigation has used genetic approaches to focus 
on rs7903146. Alternative explanations can be proposed. 
For example, rs7903146 could be in LD with an unknown 
common variant. We cannot exclude this possibility with 
total confidence, but the assessment of markers in TCF7L2 
by direct genotyping, imputation, and then through the use 
of the 1000 Genomes data using conditional analysis sug- 
gests the likelihood that such a common variant exists is 
low. Evaluation of long range LD on chromosome 10 shows 
little evidence for a remote variant (data not shown). An 
alternative is the possibility that a rare variant of large effect 
in LD with rs7903146 is the actual functional variant. This 
also seems unlikely. Although theoretically possible (33), 
we have recently shown empirically that it is easy to dif- 
ferentiate between a rare functional variant with large effect 
and a common variant in LD (34). 

Thus, fine-mapping at the TCF7L2 locus using an Af- 
rican ancestry population has statistically implicated 
rs7903146 as the causal variant. It is noteworthy that 
Gaulton et al. (20) have implicated rs7903146 as a functional 
variant by mapping sequence variants to open 
chromatin sites. They found that rs7903146 is located in 
islet-selective open chromatin, and human islet samples 
heterozygous for rs7903146 showed allelic imbalance in 
islet enhancer activity. Thus genetic and functional stud- 
ies make a consistent case for a functional role for 
rs7903146. 



scores (LOD <2) are shown in white. The number within each box indicates the r² value. B: Haploview-generated LD map of the 17 common SNPs 
(MAF >0.05) identified by direct sequence analysis (C10:114742846-114750274) in African American controls (n = 1,106). Regions of high LD 
(D' = 1 and LOD >2) are shown in dark red. Markers with lower LD (0.45 < D' < 1 and LOD >2) are shown in light red, with the color intensity 
decreasing with decreasing D' value. Regions of low LD and low LOD scores (LOD <2) are shown in white. The number within each box indicates 
the r² value. (A high-quality color representation of this figure is available in the online issue.) 